1 Introduction

An abbreviation list is an obligatory part of a linguistic article that nobody reads. These lists contain definitions of abbreviations used in the article (e.g. corpus names or sign language names), but also a list of linguistic glosses, i.e. abbreviations used in linguistic interlinear examples. There is a standardized list of glossing rules (Comrie, Haspelmath, and Bickel 2008), which ends with a list of 84 standard abbreviations. A much bigger list is available on the Wikipedia page. However, researchers can deviate from those lists and provide their own abbreviations.

The worst abbreviation list that I have found in a published article makes it clear that there is room for improvement:

NOM = nominative, GEN = nominative, DAT = nominative, ACC = accusative, VOC = accusative, LOC = accusative, INS = accusative, PL = plural, SG = singular

Apart from the obvious mistakes in this list, there are some further problems that I want to emphasize:

  • the list is not in alphabetical order;
  • some abbreviations used in the article (sbjv, imp) are absent from the abbreviation list.
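
Both problems can be checked mechanically. The following base R sketch is only an illustration and is not part of lingglosses; the vector `used` is a hypothetical selection of abbreviations that occur in an article's examples, and `listed` repeats the list quoted above.

```r
# Abbreviations defined in the list quoted above, in their original order
listed <- c("NOM", "GEN", "DAT", "ACC", "VOC", "LOC", "INS", "PL", "SG")
# Abbreviations used in the article's examples (hypothetical selection)
used <- c("NOM", "ACC", "PL", "SBJV", "IMP")

# TRUE only if the list is already in alphabetical order
is_sorted <- identical(listed, sort(listed))
# Abbreviations used in the text but missing from the list
missing_glosses <- setdiff(used, listed)

print(is_sorted)
print(missing_glosses)
```

A check of this kind catches omissions, but not wrong definitions such as GEN = nominative, which still need a human reader.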

The main goal of the lingglosses package is to provide an option for creating:

  • linguistic glosses for the .html output of rmarkdown (Xie, Allaire, and Grolemund 2018; see footnote 1);
  • a semi-automatically compiled abbreviation list.

For the moment the package is available only from GitHub, so in order to install it you need to run the following commands:

install.packages("remotes")
remotes::install_github("agricolamz/lingglosses")

In order to use the package you need to load it with the library() call:

library(lingglosses)

2 Create glossed examples with gloss_example()

2.1 Basic usage

The main function of the lingglosses package is gloss_example(). This function has the following arguments:

  • transliteration;
  • glosses;
  • free_translation;
  • comment;
  • orthography (see footnote 2);
  • line_length.

All arguments except the last one are self-explanatory.

gloss_example(transliteration = "bur-e-**ri** c'in-ne-sːu",
              glosses = "fly-NPST-**INF** know-HAB-*NEG*",
              free_translation = "I cannot fly. (Zilo Andi, East Caucasian)",
              comment = "(lit. do not know how to)",
              orthography = "Бурери цIиннессу.")
Бурери цIиннессу.
bur-e-ri c’in-ne-sːu
fly-npst-inf know-hab-neg
(lit. do not know how to)
‘I cannot fly. (Zilo Andi, East Caucasian)’

In this first example you can see that:

  • the transliteration line is italic by default (if you do not want it, just add the transliteration_italic = FALSE argument);
  • users can use standard markdown syntax (e.g. **a** for bold and *a* for italic);
  • the free translation line is framed with quotation marks.

Since argument names are optional in R, users can omit them as long as they follow the order of the arguments (you can always find the correct order in ?gloss_example):

gloss_example("bur-e-**ri** c'in-ne-sːu",
              "fly-NPST-**INF** know-HAB-_NEG_",
              "I cannot fly. (Zilo Andi, East Caucasian)",
              "(lit. do not know how to)")
bur-e-ri c’in-ne-sːu
fly-npst-inf know-hab-neg
(lit. do not know how to)
‘I cannot fly. (Zilo Andi, East Caucasian)’

It is possible to number and reference your examples using the standard rmarkdown tool for generating lists (@):

(@) my first example
(@) my second example
(@) my third example

renders as:

  1. my first example
  2. my second example
  3. my third example

In order to reference examples in the text you need to give them some names:

(@my_ex) example for the referencing
  4. example for the referencing

Once an example has a name, you can reference it in the text, as in example (4), using the code (@my_ex).

This kind of example referencing can also be used with lingglosses examples, as in (5) and (6). The only important details are:

  • set the code chunk option echo = FALSE (or set it for all code chunks with the following command at the beginning of the document: knitr::opts_chunk$set(echo = FALSE));
  • do not put an empty line between the reference line (with (@...)) and the code chunk with the lingglosses code.
  5. bur-e-ri c’in-ne-sːu
    fly-npst-inf know-hab-neg
    (lit. do not know how to)
    ‘I cannot fly. (Zilo Andi, East Caucasian)’
  6. Zilo Andi, East Caucasian
    bur-e-ri c’in-ne-sːu
    fly-npst-inf know-hab-neg
    (lit. do not know how to)
    ‘I cannot fly.’

Sometimes people gloss morpheme by morpheme (this is especially useful for polysynthetic languages). This is also possible in lingglosses (and you can annotate slots with the orthography argument, see footnote 2 for the details):

  7. Abaza, West Caucasian (Arkadiev and Lander 2020: example 5.2)
gloss_example("s- z- á- la- nəq'wa -wa -dzə -j -ɕa -t'",
              "1SG.ABS POT 3SG.N.IO LOC pass IPF LOC 3SG.M.IO seem(AOR) DCL",
              "It seemed to him that I would be able to pass there.")
s- z- á- la- nəq’wa -wa -dzə -j -ɕa -t’
1sg.abs pot 3sg.n.io loc pass ipf loc 3sg.m.io seem(aor) dcl
‘It seemed to him that I would be able to pass there.’

2.2 Multiline examples

Sometimes examples are too long and do not fit on the page. In that case you need to add the argument results='asis' to your chunk, and gloss_example() will automatically split your example into multiple rows.

  8. Mishlesh Tsakhur, East Caucasian (Maisak and Tatevosov 2007: 386)
gloss_example('za-s jaːluʁ **wo-b** **qa-b-ɨ**; turs-ubɨ qal-es-di ǯiqj-eː jaːluʁ-**o-b** **qa-b-ɨ**', 
               '1SG.OBL-DAT shawl.3 AUX-3 PRF-3-bring.PFV woolen_sock-PL NPL.bring-PL-A.OBL place-IN shawl.3-AUX-3 PRF-3-bring.PFV',
               '(they) **brought** me a shawl; instead of (lit. in place of bringing) woolen socks, (they) **brought** a shawl.',
               '(Woolen socks are considered to be more valuable than a shawl.)')
za-s jaːluʁ wo-b qa-b-ɨ; turs-ubɨ qal-es-di
1sg.obl-dat shawl.3 aux-3 prf-3-bring.pfv woolen_sock-pl npl.bring-pl-a.obl
ǯiqj-eː jaːluʁ-o-b qa-b-ɨ
place-in shawl.3-aux-3 prf-3-bring.pfv
(Woolen socks are considered to be more valuable than a shawl.)
‘(they) brought me a shawl; instead of (lit. in place of bringing) woolen socks, (they) brought a shawl.’

If you are not satisfied with the result of the automatic split, you can change the value of the line_length argument (the default value is 70, which means 70 characters in the longest line).
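
To make the effect of line_length more concrete, here is a minimal base R sketch of how aligned words could be distributed over rows under a character limit. It is an illustration of the idea only, not the algorithm that lingglosses actually uses; the function name split_gloss_rows() and its greedy strategy are assumptions made for this example.

```r
# Hypothetical helper: greedily assign aligned word/gloss pairs to rows so
# that no row is wider than line_length characters.
split_gloss_rows <- function(transliteration, glosses, line_length = 70) {
  words  <- strsplit(transliteration, " ", fixed = TRUE)[[1]]
  labels <- strsplit(glosses, " ", fixed = TRUE)[[1]]
  stopifnot(length(words) == length(labels))

  # Each aligned column is as wide as the longer of the word and its gloss
  widths <- pmax(nchar(words), nchar(labels))

  row <- integer(length(words))
  current_row <- 1
  used <- 0
  for (i in seq_along(words)) {
    # One separating space is needed unless the column starts a row
    needed <- widths[i] + as.integer(used > 0)
    if (used > 0 && used + needed > line_length) {
      current_row <- current_row + 1
      used <- 0
      needed <- widths[i]
    }
    used <- used + needed
    row[i] <- current_row
  }

  # One character vector of words (and of glosses) per row
  list(transliteration = unname(split(words, row)),
       glosses = unname(split(labels, row)))
}

rows <- split_gloss_rows("bur-e-ri c'in-ne-sːu",
                         "fly-NPST-INF know-HAB-NEG",
                         line_length = 15)
print(rows)
```

In this sketch a smaller line_length produces more, shorter rows, and a word that is wider than the limit simply gets a row of its own.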

2.3 In-text examples

When an example is short, the author may prefer to keep it within the text rather than put it in a separate paragraph. This can be achieved with standard rmarkdown inline code. The result of R code can be inserted into an rmarkdown document using backticks and a lowercase r; for example, `r 2+2` will be rendered as 4. Currently lingglosses cannot automatically detect whether code is provided via a code chunk or inline. So if you want to use in-text glossed examples and want their glosses to appear in the glosses list, write them using gloss_example() with the intext = TRUE argument. Here is a Turkish example from DeLancey (1997): Kemal gel-miş (Kemal come-mir), which was produced with the following inline code:

`r gloss_example("Kemal gel-miş", "Kemal come-MIR", intext = TRUE)`

In the third section I show how to create a semi-automatically compiled abbreviation list for your document. As an example I provide the abbreviation list for this very document. Even though the mir gloss appears only in the in-text example above, it is included in the gloss lists presented in the third section.

2.4 Stand-alone glosses

Sometimes glosses are used without any example, e.g. in a table or in running text. If you want to use such glosses and want them to appear in the glosses list, write them using the add_gloss() function. As an example I adapted the verbal inflection paradigm of Andi (East Caucasian) from Table 2 in Verhees (2019: 199):

      aff    neg
aor   -∅     -sːu
msd   -r     -sːu-r
hab   -do    -do-sːu
fut   -dja   -do-sːja
inf   -du    -du-sːu

that is generated using the following markdown code (see footnotes 3 and 4):

|                      | `r add_gloss("AFF")` | `r add_gloss("NEG")` |
|----------------------|----------------------|----------------------|
| `r add_gloss("AOR")` | -∅                   | *-sːu*               |
| `r add_gloss("MSD")` | *-r*                 | *-sːu-r*             |
| `r add_gloss("HAB")` | *-do*                | *-do-sːu*            |
| `r add_gloss("FUT")` | *-dja*               | *-do-sːja*           |
| `r add_gloss("INF")` | *-du*                | *-du-sːu*            |

In the third section I show how to create a semi-automatically compiled abbreviation list for your document, using the abbreviation list of this very document as an example. Even though the fut and msd glosses appear only in the table above, they are included in the gloss lists presented in the third section. dattttt

3 Create a semi-automatically compiled abbreviation list

After you have finished your text, you can call the make_gloss_list() function to automatically create a list of abbreviations.

make_gloss_list()

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; aff — affix; aor — aorist; aux — auxiliary; dat — dative; dattttt — ; dcl — declarative; fut — future; hab — habitual; imp — imperative; in — inessive; inf — infinitive; io — indirect object; ipf — imperfective; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — ; npst — non-past; obl — oblique; pfv — perfective; pl — plural; pot — potential; prf — perfect; prfx — prefix; root — root; sbjv — subjunctive; sfx — suffix

This function works with the built-in dataset glosses_df, which is compiled from the Leipzig Glossing Rules, the Wikipedia page, and articles from the open access journal Glossa (see footnote 5). Anyone can download and change this dataset for their own purposes.

If you are not satisfied with the result of the make_gloss_list() function, there are two possible strategies. The first is to copy the result of make_gloss_list(), modify it, and paste it into your rmarkdown document. The second is useful when, for example, you work on a volume dedicated to one group of languages and want to ensure that glosses are the same across all articles: you can compile your own table with the columns gloss and definition and pass it to the make_gloss_list() function. As you can see, all glosses specified in the my_abbreviations dataset changed their definitions in the output below:

my_abbreviations <- data.frame(gloss = c("NPST", "HAB", "INF", "NEG"),
                               definition = c("non-past tense", "habitual aspect", "infinitive", "negation marker"))
make_gloss_list(my_abbreviations)

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; aff — affix; aor — aorist; aux — auxiliary; dat — dative; dattttt — ; dcl — declarative; fut — future; hab — habitual aspect; imp — imperative; in — inessive; inf — infinitive; io — indirect object; ipf — imperfective; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation marker; np — noun phrase; npl — ; npst — non-past tense; obl — oblique; pfv — perfective; pl — plural; pot — potential; prf — perfect; prfx — prefix; root — root; sbjv — subjunctive; sfx — suffix
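
For a multi-author volume, the shared table could also be kept in a file that every contributor reads. The base R sketch below assumes a CSV file with the two columns gloss and definition; the file is written to a temporary path here purely for illustration, and the name volume_abbreviations is hypothetical.

```r
# A shared abbreviation table, written to a temporary file for illustration
csv_path <- tempfile(fileext = ".csv")
writeLines(c("gloss,definition",
             "NPST,non-past tense",
             "HAB,habitual aspect"),
           csv_path)

volume_abbreviations <- read.csv(csv_path, stringsAsFactors = FALSE)

# The table needs the two columns described above and no repeated glosses
stopifnot(all(c("gloss", "definition") %in% names(volume_abbreviations)),
          anyDuplicated(volume_abbreviations$gloss) == 0)

# The checked table could then be passed on as in the call shown above:
# make_gloss_list(volume_abbreviations)
```

Checking the column names before use means a misnamed column is caught when the file is read rather than when the list is compiled.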

Unfortunately, some glosses can have multiple meanings in different traditions (e.g. ass can be either associative plural or assertive mood). By default make_gloss_list() shows only those entries that were chosen by the package author. You can see all possibilities if you add the argument all_possible_variants = TRUE. As you can see, there are multiple possible values for aff, ass, imp and prf:

make_gloss_list(all_possible_variants = TRUE)

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; aff — affirmative; aff — affix; aor — aorist; ass — assertive; ass — associative; aux — auxiliary; dat — dative; dattttt — ; dcl — declarative; fut — future; hab — habitual; imp — imperative; imp — imperfect; imp — imperfective; imp — impersonal; in — inessive; inf — infinitive; io — indirect object; ipf — imperfective; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — ; npst — non-past; obl — oblique; pfv — perfective; pl — plural; pot — potential; prf — perfect; prf — perfective; prfx — prefix; root — root; sbjv — subjunctive; sfx — suffix

You may have noticed that problematic glosses (those without a definition or with duplicated entries) are colored. This can be switched off by adding the argument annotate_problematic = FALSE:

make_gloss_list(all_possible_variants = TRUE, annotate_problematic = FALSE)

1sg — first person singular; 3 — third person; 3sg — third person singular; a — agent-like argument of canonical transitive verb; abs — absolutive; aff — affirmative; aff — affix; aor — aorist; ass — assertive; ass — associative; aux — auxiliary; dat — dative; dattttt — ; dcl — declarative; fut — future; hab — habitual; imp — imperative; imp — imperfect; imp — imperfective; imp — impersonal; in — inessive; inf — infinitive; io — indirect object; ipf — imperfective; loc — locative; m — masculine; mir — mirative; msd — masdar; n — neuter; neg — negation; np — noun phrase; npl — ; npst — non-past; obl — oblique; pfv — perfective; pl — plural; pot — potential; prf — perfect; prf — perfective; prfx — prefix; root — root; sbjv — subjunctive; sfx — suffix
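
The two kinds of problematic glosses, those without a definition and those with more than one entry, can also be located in any gloss table with a few lines of base R. The data frame below is a hypothetical excerpt modelled on the output above, and the sketch does not reflect how lingglosses detects such cases internally.

```r
# Hypothetical excerpt of a compiled gloss table
gloss_table <- data.frame(
  gloss = c("aff", "aff", "aor", "dattttt", "npl"),
  definition = c("affirmative", "affix", "aorist", "", ""),
  stringsAsFactors = FALSE
)

# Glosses that have no definition
undefined <- gloss_table$gloss[gloss_table$definition == ""]
# Glosses that are listed with more than one definition
ambiguous <- unique(gloss_table$gloss[duplicated(gloss_table$gloss)])

print(undefined)
print(ambiguous)
```

Entries found this way are natural candidates for a custom table of definitions, as described for my_abbreviations.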

4 Other output formats

Right now there is no direct way of knitting lingglosses examples to the .docx format; however, you can work around this by copying and pasting from the .html version.

The .pdf output is possible, however there are some known restrictions:

  • markdown bold and italic annotations do not work;
  • example numbers appear above the example;
  • there is no non-breaking space in the glosses list.

So if you want to avoid those problems, the best solution is to use one of the LaTeX glossing packages listed in the first footnote together with the glossaries package for automatic compilation of glosses.

References

Arkadiev, P., and Y. Lander. 2020. “The Northwest Caucasian Languages.” In The Oxford Handbook of the Languages of the Caucasus, 369–446.
Comrie, B., M. Haspelmath, and B. Bickel. 2008. “The Leipzig Glossing Rules: Conventions for Interlinear Morpheme-by-Morpheme Glosses.”
DeLancey, S. 1997. “Mirativity: The Grammatical Marking of Unexpected Information.” Linguistic Typology 1 (1): 33–52.
Goldsmith, J. 1979. “The Aims of Autosegmental Phonology.” In Current Approaches to Phonological Theory, edited by D. A. Dinnsen, 202–22. Indiana University Press Bloomington, IN.
Maisak, T., and S. Tatevosov. 2007. “Beyond Evidentiality and Mirativity: Evidence from Tsakhur.” In L’Énonciation médiatisée II, 377–406.
Verhees, Samira. 2019. “General Converbs in Andi.” Studies in Language. International Journal Sponsored by the Foundation “Foundations of Language” 43 (1): 195–230.
Xie, Y., J. J. Allaire, and G. Grolemund. 2018. R Markdown: The Definitive Guide. CRC Press.

  1. If you want to render a .pdf version, you can either use LaTeX and one of the linguistic packages developed for it (see e.g. gb4e, langsci, expex, philex), or render .html first and convert it to .pdf afterwards.

  2. It is also possible to use this tier for the annotation of words like here:
    HL H L H
    eze a za a
    np prfx root sfx
    ‘Eze swept… (Igbo, from (Goldsmith 1979: 209))’
  3. The table generated with markdown is visually poor. There are many other ways to generate a table in R: kable() from knitr, the kableExtra package, the DT package, and many others.

  4. It is easier to generate Markdown or LaTeX tables with LibreOffice or MS Excel and then use an online conversion website such as https://www.tablesgenerator.com/.

  5. The script for collecting the glosses is available here. The glosses list was manually corrected and merged with glosses from other sources. Glosses of this kind are marked in the glosses_df dataset as lingglosses in the source column.